Focus Annotation of Task-based Data: Establishing the Quality of Crowd Annotation
نویسندگان
چکیده
We explore the annotation of information structure in German and compare the quality of expert annotation with crowdsourced annotation taking into account the cost of reaching crowd consensus. Concretely, we discuss a crowd-sourcing effort annotating focus in a task-based corpus of German containing reading comprehension questions and answers. Against the backdrop of a gold standard reference resulting from adjudicated expert annotation, we evaluate a crowd sourcing experiment using majority voting to determine a baseline performance. To refine the crowd-sourcing setup, we introduce the Consensus Cost as a measure of agreement within the crowd. We investigate the usefulness of Consensus Cost as a measure of crowd annotation quality both intrinsically, in relation to the expert gold standard, and extrinsically, by integrating focus annotation information into a system performing Short Answer Assessment taking into account the Consensus Cost. We find that low Consensus Cost in crowd sourcing indicates high quality, though high cost does not necessarily indicate low accuracy but increased variability. Overall, taking Consensus Cost into account improves both intrinsic and extrinsic evaluation measures.
منابع مشابه
Focus Annotation of Task-based Data: A Comparison of Expert and Crowd-Sourced Annotation in a Reading Comprehension Corpus
While the formal pragmatic concepts in information structure, such as the focus of an utterance, are precisely defined in theoretical linguistics and potentially very useful in conceptual and practical terms, it has turned out to be difficult to reliably annotate such notions in corpus data (Ritz et al., 2008; Calhoun et al., 2010). We present a large-scale focus annotation effort designed to o...
متن کاملImplementation and Optimization of Annotation and Interpretation Step of Next-Generation Sequencing Data for Non-Syndromic Autosomal Recessive Hearing Loss
Introduction: The precision and time required for analysis of data in next-generation sequencing (NGS) depends on many factors including the tools utilized for alignment, variant calling, annotation and filtering of variants, personnel expertise in data analysis and interpretation, and computational capacity of the lab and its optimization is a challenging task. Method: An application software...
متن کاملTags Re-ranking Using Multi-level Features in Automatic Image Annotation
Automatic image annotation is a process in which computer systems automatically assign the textual tags related with visual content to a query image. In most cases, inappropriate tags generated by the users as well as the images without any tags among the challenges available in this field have a negative effect on the query's result. In this paper, a new method is presented for automatic image...
متن کاملAn annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
متن کاملFuzzy Neighbor Voting for Automatic Image Annotation
With quick development of digital images and the availability of imaging tools, massive amounts of images are created. Therefore, efficient management and suitable retrieval, especially by computers, is one of themost challenging fields in image processing. Automatic image annotation (AIA) or refers to attaching words, keywords or comments to an image or to a selected part of it. In this paper,...
متن کامل